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Abstract: The Fast Multipole Method (FMM) provides a highly efficient compu- 
tational tool for solving constant coefficient partial differential equations (e.g. the 
Poisson equation) on infinite domains. The solution to such an equation is given 
as the convolution between a fundamental solution and the given data function, 
and the FMM is used to rapidly evaluate the sum resulting upon discretization 
, of the integral. This paper describes an analogous procedure for rapidly solving 

elliptic difference equations on infinite lattices. In particular, a fast summation 
D ■ technique for a discrete equivalent of the continuum fundamental solution is con- 

Q i structed. The asymptotic complexity of the proposed method is ©(A'^^ourcc)) where 

Q ' -/Vsource IS the number of points subject to body loads. This is in contrast to FFT 

' based methods which solve a lattice Poisson problem at a cost 0(A^n log iVn) 

independent of A'source, where Q is an artificial rectangular box containing the 
' loaded points and iV^ is the number of points in 0. 



1. Introduction 

This paper describes an efficient technique for solving Poisson problems defined on the 



^ ■ integer lattice Z^. For simplicity of presentation, we limit our attention to the equation 

(1.1) [Au]{m) = f{m), meZ^, 

CS| ', where / = f{m) and u = u{m) are scalar valued functions on Z^, and where A is the 

■ so-called discrete Laplace operator 

O' (1-2) [A n](m) = 4n(m) — u(m + ei) — n(m — ei) — n(m + 62) — n(m — 62), m^l?. 

^ '. 

Cf^ , In (1.2), ei = [1,0] and 62 = [0,1] are the canonical basis vectors in Z^. If / G L^(Z^) 

ly-^ I and X^^g22 \f{m)\ = 0, equation (1.1) is well-posed when coupled with a suitable decay 

O ' condition for u, see [19] for details. 

We are primarily interested in the situation where the given function / (the source) is 
supported at a finite number of points which we refer to as source locations, and where the 
function u (the potential) is sought at a finite number of points called target locations. While 
^ I the solution technique is described for the equation (1.1) involving the specific operator 

^ ■ (1-2), it may readily be extended to a broad range of lattice equations involving constant 

- - - coefficient elliptic difference operators. 

Variations of the equation (1.1) are perhaps best known as a set of equations associated 
with the discretization of elliptic partial differential equations. However, such equations 
also emerge in their own right as natural models in a broad range of applications: random 
walks [9] , analyzing the Ising model (in determining vibration modes of crystals) , and many 
others in engineering mechanics including micro-structural models, macroscopic models, 
simulating fractures [23, 16] and as models of periodic truss and frame structures [6, 25, 
19, 24]. 

Of particular interest in many of these applications is the situation where the lattice 
involves local deviations from perfect periodicity due to either broken links, or lattice in- 
clusions. The fast technique described in this paper can readily be modified to handle such 
situations, see Section 10.1. It may also be modified to handle equations defined on finite 
subsets of Z^, with appropriate conditions (Dirichlet / Neumann / periodic) prescribed on 
the boundary, see Section 10.2 and [10]. 
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The technique described is a descendant of the Fast Multipole Method (FMM) [12, 13, 
14], and, more specifically, of "kernel independent" FMMs [11, 21, 26]. A key application 
of the original FMM was to rapidly solve the Poisson equation 

(1.3) -Au{x) = f{x), a;eM^ 

which is the continuum analog of (1.1). The FMM exploits the fact that the analytic 
solution to (1.3) takes the form of a convolution 

(1-4) u{x) = 4>cont{x - y) f{y)dy, 

where (/>cont is the fundamental solution of the Laplace operator, 

(1-5) (pcoiA^) = -77- log 1^1- 

zvr 

If the source function / corresponds to a number of point charges {qj}jLi placed at locations 
{xj}jLi, and if the potential u is sought at same set of locations, then the convolution (1.4) 
simplifies to the sum 

N 

(1.6) Uj = y^0cont(a;i -Xj)qj, z = 1, 2, . . . , TV. 

i=i 

While direct evaluation of (1.6) requires 0{N^) operations since the kernel is dense, the 
FMM constructs an approximation to the potentials {n,,}^j^ in 0{N) operations. Any 
requested approximation error e can be attained, with the constant of proportionality in 
the 0{N) estimate depending only logarithmically on e. 

In the same way that the FMM can be said to rely on the fact that the Poisson equation 
(1.3) has the explicit analytic solution (1.4), the techniques described in this paper can 
be said to rely on the fact that the lattice Poisson equation (1.1) has an explicit analytic 
solution in the form 

(1.7) n(m) = [<p * f]{m) = ^ 0(m - n) /(n). 

where (/> is a fundamental solution for the discrete Laplace operator (1.2). This fundamental 
solution is known analytically [7, 19, 20, 10] via the normalized Fourier integral 



(1.8) (p{m) = -J—- / / , . 2/, /ON I ^ ■ 2/. /n^ dti dt2, m = [mi, 7712] G 



1 r r COs(tl7ni + t27T72) - 1 

(2 7r)2 4 sin2(ti/2) + 4 sm^{t2/2) 

This paper presents an adaptation of the original Fast Multipole Method that enables 
it to handle discrete kernels such as (1.8) and to exploit accelerations that are possible 
due the geometric restrictions present in the lattice case. The method extends directly to 
any problem that can be solved via convolution with a discrete fundamental solution. The 
technique for numerically evaluating (1.8) extends directly to other kernels, see Section 3. 

While we are not aware of any previously published techniques for rapidly solving the 
free space problem (1.1) (or, equivalently, for evaluating (1.7)), there exist very fast solvers 
for the closely related case of lattice Poisson equations defined on rectangular subsets of 7? 
with periodic boundary conditions. Such equations become diagonal when transformed to 
Fourier space, and may consequently be solved very rapidly via the FFT. The computational 
time Tfjt required by such a method satisfies 

(1-9) Tfft ~ A'^domain log A'domam aS Adomain OO, 
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where A'domain denotes the number of lattice nodes in the smallest rectangular domain 
holding all source locations, and where the constant of proportionality is very small. Similar 
complexity, sometimes without the logarithmic factor, and with fewer restrictions on the 
boundary conditions, may also be achieved via multigrid methods [3]. 

The principal contribution of the present work is that the computational time Tfmm 
required by the method described here has asymptotic complexity 



(1.10) 



as N^, 



oo, 



where A'sources denotes the number of lattice nodes that are loaded (assuming that the 
solution is sought only at the source points). In a situation where the source points are 
relatively densely distributed in a rectangle, we would have A'domain ~ -^sources and there 
would be no point in using the new method (in fact, an FFT based method is in this case 
significantly faster since the constant of proportionality in (1.9) is smaller than that in 
(1.10)). However, if the source and target points are relatively sparsely distributed in the 
lattice, then the estimate (1.10) of the new method is clearly superior to that of (1.9) for 
an FFT based method. As demonstrated in Section 9, very significant gains in speed can 
be achieved. Perhaps even more importantly, much larger problems can be handled since 
an FFT based method requires that the potential on all A'^domain nodes be held in memory. 

Example: The distinction between Adomain in (1-9) and A'sourccs in (1-10) can be illus- 
trated with the toy example shown in Figure 1.1. The figure illustrates a portion of an 
infinite lattice in which Agourcc = 11 nodes have been loaded. A rectangular domain 
covering these loads is marked with a blue dashed line and holds Adomain = 80 nodes. 
Clearly Agourccs = 11 ^ 80 = Adomain- A solution strategy for (1.1) based on the FFT 
or multigrid would involve all Adomain nodes inside the rectangle. In contrast, the lattice 
fundamental solution allows the solution task to be reduced to evaluating the sum (1.7) 
which involves an Agources ^ Agources dense coefficient matrix. 



Figure 1.1. A subset of the infinite lattice Z^. The Agources = H red 
nodes are loaded. The smallest rectangle holding all sources is marked with 
a dashed blue line. It has Adomain = 80 nodes. 



2. Review of fast summation techniques 

In this section, we briefly outline the basic ideas behind the Fast Multipole Method, 
and then describe the modifications required to evaluate a lattice sum such as (1.7). Our 



Figure 2.1. Geometry of the A^-body problem in Section 2. Source i is 
blue, the sources in J^^'^^ as defined by (2.3) are red. 

presentation assumes some familiarity with Fast Multipole Methods; for an introduction, 
see, e.g., [2, 13]. As a model problem, we consider the task of evaluating the sum 

N 

(2.1) Ui = ^(pixi - Xj)qj, 

i=i 

where {xi}^-^ is a set of N points in the plane, where {qi}iLi is a set of N given real 
numbers called charges, where {qi}^i is a set of N sought real numbers called potentials, 
and where <p : x — )• M is a kernel function. 

For simplicity, we consider in this review only the case where the sources are more or 
less uniformly distributed in a computational box in the sense that can be split into 
equi-sized small boxes, called leaves, in such a way that each small box holds about the 
same number of sources. We let A'^ieaf denote an upper bound for the number of sources 
held in any leaf. Then the sum (2.1) can be split into two parts 

where the near-field is defined by 

(2.2) (l>{^^-^j)qj, i = l,2,...,N, 

where J^^^^ is an index list marking all sources that lie either in the same box as charge i, 
or in a box that is directly adjacent to the box holding source i, 

(2.3) Jf^^^^ = {j : Xi and Xj are located in the same leaf or in directly adjacent leaves}. 
The definition of Jf'''^'^ is illustrated in Figure 2.1. The far- field is then defined by 

(2.4) u'r= cl){xi-Xj)qj, i = l,2,...,N, 

The near-field (2.2) can now be directly evaluated at low cost since at most 9 A'lcaf sources 
are near any given source. In the lattice case, this step could potentially be rendered 
expensive by the fact that the kernel is known only via the Fourier integral (1.8) which is 
quite costly to evaluate via quadrature. We describe in Section 3 how this step may be 
accelerated by pre-computing and storing the values of (j){m) for all small values of m and 
then using an asymptotic expansion for large m. We observe that the local evaluation gets 
particularly effective whenever the number of lattice cells along the side of any leaf box is 
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bounded by some fixed number L of moderate size (say L < 1000). In tliis case, there is in 
the lattice situation only 16 possible relative positions of two charges that are near each 
other which means that evaluation of the kernel for the near-field calculations amounts to 
simply a table lookup. (In fact, due to symmetries, only 2L^ values need to be stored.) 

The far- field (2.4) is as in the classical FMM evaluated via the computation of so-called 
multipole expansions and incoming expansions. These in turn are constructed via a hierar- 
chical procedure on a quad-tree such as the one shown in Figure 6.1. With the development 
of so-called kernel independent FMMs, the multipole expansions of the original FMM were 
superseded by more general representations valid for a broad range of kernels. The bulk of 
this paper consists of a description of such a kernel independent FMM, adapted to exploit 
geometrical restrictions imposed in the lattice case. Section 4 reviews a technique for com- 
pactly representing charges and potentials, and Section 5 describes how it can be adapted 
to the particular case of lattice equations. Section 6 introduces notation for handling quad- 
trees, Section 7 describes the so-called translation operators, then the full lattice FMM is 
described in Section 8. 



3. Evaluation of the lattice fundamental solution 

The numerical evaluation of the function in (1.8) requires some care since the integrand 
has a singularity at the origin and gets highly oscillatory when |m| is large. The latter 
issue can be handled quite easily since a highly accurate asymptotic expansion of (p{m) 
as |m| — 7- oo is known, see Section 3.1. When |m| is small, quadrature and Richardson 
extrapolation may be used to compute (p^m) to very high accuracy, see Section 3.2. We note 
that in the regime where |m| is small, there is only a limited number of possible values of 
m, and the corresponding values of (p{m) can be pre-computed and stored. Consequently, 
evaluating (j)[m) in the near-field amounts simply to a table look-up, which is very fast. 

3.1. Evaluation of fundamental solution for \m\ large. It has been established (see 
e.g. [7, 8, 18, 19, 20]) that as |m| — )• oo, the fundamental solution defined by (1.8) has 
the asymptotic expansion 

1 / log8\ 1 m\-Qmlml+m\ 
(3.1) 0(m) = -— log|m| +7H ^ + 



27r V 2 y 247r \m\= 

1 A3m^ - 772m'^m'^ + IbTOmimi - 772771"^ + A3m^ ^, 

H — ^ — -ir-^ — + 0(1/ mT). 

4807r |m|i2 ^ / ' ' ^ 

The number 7 is the Euler constant (7 = 0.577206 • • • ). 

For |m| large, we approximate (f) by dropping the 0(l/|m|^) term off the asymptotic 
expansion. We found that for |m| > 30 the expansion (3.1) is accurate to at least 10"^^. 

The asymptotic expansion (3.1) is valid for the simple square lattice only. However, 
there is a simple process for constructing analogous expansions for fundamental solutions 
associated with a very broad class of constant coefficient elliptic difference operators [20] . 
The process can be automated and executed using symbolic software such as Maple [19]. 

3.2. Evaluation of fundamental solution for |m| small. When |m| is small enough 
that the asymptotic expansion provides insufficient accuracy, we approximate the integral 
(1.8) using a two-step quadrature procedure: First, the domain [— vr, vr]^ is split into n x 
n equisized boxes where n is an odd number chosen so that each box holds about one 
oscillation of the integrand (in other words, n w |"^|)- For each box not containing the 
origin, the integral is approximated using a Cartesian Gaussian quadrature with 20 x 20 
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nodes. This leaves us with the task of evaluating the integral 
, . _ 1 r r cos(timi + t2m2) - 1 
^^"^ " (27r)2 y_J_, 4 sin2(ti/2) +4 sin2(t2/2 



where a = vr/n denotes the size of the center box. Now observe that 

(3.2) .(«) = g = g pl^p 4si„=(V2) + 4si„=(V2) ''''' 

where 

0„ = [2~" a, 2-" a]2\[2-"^i a, 2^"-^ a]^, n = 1, 2, 3, . . . 

is a sequence of annular domains whose union is the square [—a, a]^. All integrals in (3.2) 
involve non-singular integrands, and can easily be evaluated via Gaussian quadratures. (We 
split each Qn into eight rectangular regions and use a 20 x 20 point Gaussian quadrature 
on each.) Using Richardson extrapolation to accelerate the convergence, it turns out that 
only about 14 terms are needed to evaluate the sum (3.2) to a precision of 10~^^. 

Remark 3.1. The particular integral (1.8) can be evaluated via a short-cut since it is 
possible to evaluate the integral over ti analytically, and then use quadrature only for the 
resulting (non-singular) integral over t2, see [19]. Similar tricks are likely possible in many 
situations involving mono-atomic lattices. However, we prefer to not rely on this approach 
since it does not readily generalize to vector valued problems (such as those associated with 
mechanical lattice problems) or multi- atomic lattices. 

4. Outgoing and incoming expansions 

In this section, we present techniques for efficiently approximating the far- field uf^'^ to 
any given positive precision e. The parameter e can be tuned to balance the computational 
cost verses the accuracy. In the numerical examples reported in Section 9, e = 10"^*^. 

4.1. Interaction ranks. An essential component of the classical FMM is an efficient tech- 
nique for representing potentials and source distributions via "expansions" of different 
kinds. To illustrate the concept, let us consider a simplified problem in which a num- 
ber of sources are placed in a "source box" and the potential induced by these sources 
is to be evaluated at a number of locations in a "target box" 0,^. The orientation of the 
boxes is shown in Figure 4.1. To be precise, we suppose that sources {(?J}jLi are placed at 
locations {a;J}j^^ C and that we seek the potentials {uf induced at some locations 

N 

(4.1) < = J]cD«, a;J)gJ, i = 1, 2, . . . , M. 

In this review of the classical FMM, the kernel $ is defined by 

^{x,y) = <^cont(a; -y) = -^log 1^ - y\' 

where (/'cont is the fundamental solution of the Laplace equation. For convenience, we write 

(4.1) as a matrix- vector product 

(4.2) M'^ = A'^'^q^, 

where u'^ = {uf]^-^^ and = [(zJJjLu and where A'^''^ is the M x N matrix with entries 

A°,:- = <I>«,4). 



Figure 4.1. Illustration of source box and target box Oo-. 

A key observation underlying the FMM is that to any finite precision e, the rank of a 
matrix such as A"^'"^ is bounded independently of the numbers M and N of targets and 
sources in the two boxes. In fact, the e-rank P of A*^'"^ satisfies 

P<log(l/e), ase^O. 

The constant of proportionality depends on the geometry of the boxes, but is typically very 
modest. As a consequence of this rank deficiency, it is possible to factor the matrix A""'"^, 
say 

A'^'^ ^ B C, 
M X N M X P Px N 

and then to evaluate the potential in two steps: 

(4.4) V = Cq"^, u^Bv. 

The cost of evaluating u via (4.4) is 0{{M + N)P), which should be compared to the 
0{M N) cost of evaluating u via (4.2). 

4.2. Formal definitions of outgoing and incoming expansions. In the classical FMM, 
a "multipole expansion" for a box is a short vector from which the potential caused by all 
charges in the box can be evaluated; it can be viewed as a compressed representation of 
all the charges inside the box. In this section, we introduce the "outgoing expansion" 
as a generalization of this idea that allows representations other than classical multipole 
expansions to be incorporated. The "incoming expansion" is analogously introduced to 
generalize the concept of a "local expansion." 

Well-separated boxes: Let be a box with side length 2a and center c as shown in Figure 
4.2. We say that a point x is well-separated from Q if it lies outside the square of side 
length 6a centered at c. We say that two boxes Q and f2' are well-separated if every point 
in 0' is well-separated from Q, and vice versa. 

Outgoing expansion: Let be a box containing a set of sources. We say that a vector q is 
an outgoing expansion for if the potential caused by the sources in can be reconstructed 
from q to within precision e at any point that is well-separated from $7. 

Incoming expansion: Let be a box in which a potential has been induced by a set of 
sources located at points that are well-separated from We say that a vector u is an 
incoming expansion for Q if u can be reconstructed from u to within precision e. 



(4.3) 
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Figure 4.2. Illustration of well- separated points. Any point on or outside 
of the dashed square is well- separated from VL. Consequently, the points 
0^3, and are well-separated from Vt, but xi is not. 

4.3. Charge basis. The cost of computing a factorization such as (4.3) using a generic lin- 
ear algebraic technique such as QR is O(MiVP), which would negate any savings obtained 
when evaluating the matrix-vector product (unless a very large number of matrix-vector 
products involving the same source and target locations is required). Fortunately, it is 
possible in many environments to construct such factorizations much faster. The classical 
FMM uses multipole expansions. As an alternative, an approach based on so-called "proxy 
charges" has recently been developed [26]. It has been demonstrated [21, 22] that for any 
given box il, it is possible to find a set of locations Y = {yp]p^i C ^ with the property 
that sources placed at these points can to high accuracy replicate any potential caused by 
a source distribution in il. The number of points P required is given in Table 1. To be 
precise, given any set of points Y = {yj}f^i C and any sources q = {qj}f^i, we can find 
"equivalent charges" q = {qp}p=i such that 

N P 

(4.5) ^(/>cont(iC - yj)qj ~^(/>cont(iC - yp)qp, 

j=l p=l 

whenever x is well-separated from Q. The approximation (4.5) holds to some preset (rel- 
ative) precision e. Moreover, the map from q to q is linear, and there exists a matrix 
Tofs = Tofs(l", Y) such that 

(4.6) q = Tofs q, 

where "ofs" is an abbreviation of "outgoing [expansion] from sources." 





e 






10-s 




10-13 


1/32 


19 


27 


37 


49 


1/16 


19 


27 


36 


49 


1/4 


19 


28 


37 


49 


1/2 


21 


29 


37 


51 



Table 1 . The number of points P required to replicate the field to accuracy 
e for a box il. with side length I. 



We say that the points {yp}p=i form an outgoing skeleton for $7, and that the vector q 
is an outgoing expansion of 0. 
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In addition, we can find an incoming skeleton X = {ip}^]^ C with the property that 
any incoming potential in 0, can be interpolated from its values on the incoming skeleton. To 
be precise, suppose that U = U{x) is a potential caused by sources that are well-separated 
from Q, and that X = {xi}f£i is an arbitrary set of points in Q. Then there exists a matrix 
Ttfi = Ttfi{X , X) ('tfi" stands for "targets from incoming [expansion]") such that 

u = Ttfi u, 

where 

u = [U{xi)]fi-i^, and u = [[/(ip)]p=i. 

When the kernel $ is symmetric in the sense that ^{x — y) = ^{y — x) for all x and 
y, any outgoing skeleton is also an incoming skeleton, 

X = Y. 

Moreover, if the target points equal the source points so that X = Y , then 

Ttfi = (Tofs)* . 

Applied to the situation described in Section 4.1, where a set of sources were placed in 
a source box fi,-, and we sought to evaluate the potential induced at a set of target points 
in a box the claims of this section can be summarized by saying that A'^''^ admits an 
approximate factorization 

^ ' tfi ' ifo ' ofs 

MxiV M xP Px P Px N 
where the middle factor is simply a subsampling of the original kernel function 

Remark 4.1. For solving multiple problems involving different source and load distribu- 
tions that involve the same kernel, one set of skeleton points may be used for all problems 
by choosing the skeleton points to lie on the boundary of and Qr- The interpolation 
matrices Ttfi and Tofs need be constructed for each unique set of source and load distri- 
butions using the techniques from [5]. In Section 5, we describe this generalization of the 
skeletonization process in more detail for the lattice fundamental solution. 

5. Constructing charge bases for the lattice fundamental solution 

In this section, we describe how to construct the charge bases for the lattice fundamental 
solution defined by (1.8). 

From potential theory, we know that to capture the interaction between a set of source 
points {^TiJ in box 0,- and all points far from Qr, it is enough to capture the interaction 
between the source points and a set of "proxy" points F that lie densely on the boundary 
of a box that is concentric to and has a boundary that is well-separated from Q-j-- 

We choose the skeleton points to be a subset of the set of all points Y that lie on the 
boundary of box r. Either rank revealing QR factorization [15] or factorization techniques 
from [5] are applied to the matrix A'^'^ (whose entries are given by A^ ^ = ^{mf — mj)) 

to determine the rank P and which P points make up the set of skeleton points Y 0,r- 
Using the skeleton points and the techniques from [5], we find the P x N matrix Tofs 
such that 

(5.1) ||A^'--A^'^Tofs||<e. 

We use a similar technique to find the incoming skeleton points and the translation 
operator Ttfi. 
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Figure 6.1. A binary tree with 4 levels of uniform refinement. 

Remark 5.1. Because of the smoothness of the kernel, it is not required to use all the 
points on the boundary of the well-separated box as "proxy" points. We found it is enough 
to take 40 points per edge to approximate the far field with accuracy 10~^^. 

6. Tree structure 

The separation of variables in the kernel that was described in Section 4 is all that 
is needed to effectively evaluate a potential field whenever the set of target locations is 
well-separated from the set of source locations. When the two sets coincide, we need 
to tessellate the box containing them into smaller boxes, and use the expansion only for 
interactions between boxes that are well-separated. In this section, we describe the simplest 
such tessellation. 

Suppose that we are given a set of points {xi}^^ in a box 0. Given an integer A'leaf, 
we pick the smallest integer L such that when the box O is split into equisized smaller 
boxes, no box holds more than A'^icaf points. These equisized small boxes form the leaves 
of the tree. We merge the leaves by sets of four to form boxes of twice the side-length, 
and then continue merging by pairs until we recover the original box J7, which we call the 
root. 

The set consisting of all boxes of the same size forms what we call a level. We label 
the levels using the integer £ = 0, 1, 2, . . . , L, with i = denoting the root, and £ = L 
denoting the leaves. See Figure 6.1. 

Definition 6.1. Let r be a box in a hierarchical tree. 

• The parent of r is the box on the next coarser level that contains r. 
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• The children of r is the set Cf^ of boxes whose parent is r. 

• The neighbors of r is the set C^^^^ of boxes that are on the same level as r and are 
directly adjacent to it. 

• The interaction list of r is the set >C™* of all boxes a such that: 

(1) a and r are on the same level. 

(2) a and r are not directly adjacent. 

(3) The parents of a and r are directly adjacent. 



Example: For the tree shown in Figure 6.1, we have, e.g., 

C^f = {22, 24, 25, 26, 28}, 

= {36, 37, 48, 58, 60, 61, 70, 72}, 
4""' = {11,13,14: 21}, 

4"* = {22: 29, 30: 33, 38: 41, 47, 49, 54: 57, 60, 61, 71, 72, 73}. 

For the moment, we are assuming that the given point distribution is sufficiently uniform 
that all the leaves hold roughly the same number of points. In this case, 

L ~ log-rp . 

For non-uniform distributions of points, a uniform subdivision of 0, into 4^ boxes of equal 
length would be inefficient since many of the leaves would hold few or no points. In such 
cases, adaptive subdivisions should be used [4]. 



7. Translation operators 

In the FMM, five different so-called translation operators that construct outgoing or 
incoming expansions are required. We will, in this section, describe how to construct them, 
but we first list which operators we need: 

"'"ofs outgoing from sources translation operator: Let r denote a box holding a set of 
sources whose values are listed in the vector q^. The outgoing expansion of r is 
then constructed via 



"'"ofo -^^^ outgoing from outgoing translation operator: Suppose that a child cr of a box r 
holds a source distribution represented by the outgoing expansion q^ . The far-field 
caused by these sources can equivalently be represented by an outgoing representa- 
tion q^ of the parent, constructed via 



'^lio incoming from outgoing translation operator: Suppose that r and a are two 

well-separated boxes, and that a holds a source distribution represented by an out- 
going expansion q"^ . Then the field in r caused by these sources can be represented 
by an incoming expansion ii^ that is constructed via 
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The incoming from incoming translation operator: Suppose that r is the parent of 
a box (T. Suppose further that the incoming expansion represents a potential in 
T caused by sources that are ah well-separated from r. Then these sources are also 
well-separated from a, and the potential in a can be represented via an incoming 
expansion w'^ given by 

T^fj The targets from incoming translation operator: Suppose that r is a box whose in- 
coming potential is represented via the incoming representation ii^ . Then the po- 
tential at the actual target points are constructed via 

Techniques for constructing the matrix T^j^ were described in Section 5. Since in our 
case, the kernel is symmetric (i.e. (j){x — y) = 4>{y — x) for all x and y), these techniques 
immediately give us the targets-from-incoming translation operator as well, since 

Ttia = (Tofs) • 

We next observe that when charge b used, the outgoing-to-incoming translation 

operator is simply a sampling of the kernel function, 

"•"ifo.pg = ^^^P ~ ^9)' P, g = 1, 2, 3, . . . , P, 

where {xj}f^i and {Xj}j'^j^ are the locations of the skeleton points of r and a, respectively. 
All that remains is to construct T^J-^ and T|g^. In fact, since the kernel is symmetric, 

'ifi — {' oioj ' 

and all that actually remains is to construct the matrices T^J-q. To this end, let {aj}'^]^ 
denote children of box r. The construction of T^J-|^' closely resembles the construction of 
the Tofs operator described in Section 5. Instead of choosing the skeleton points from the 
set of all points on the boundary of r as was done in the construction of Tofs, we choose the 
skeleton points for r to be a subset of the skeleton points of its children, Y = \Y'^^ , ■ ■ ■ , Y'^'-]. 
As in Section 5, we define a set of "proxy" points F that are well-separated from r and use 
a factorization technique such as rank revealing QR to determine which points in Y make 
up the set of skeleton points Y. Using the techniques from [5], we find the interpolation 
matrix S such that 

IIA^.^'-A^'^SII <6. 

The translation operator T^J-^^ is then defined via T^J-q^ = S(:, 1 : ki) where ki is the number 
of skeleton points of ai, T^^^^ = S(:, {ki + 1) : {ki + ^2)) where k2 is the number of skeleton 
points of CJ2, etc. 

8. A LATTICE Fast Multipole Method 

While the classical FMM derives so-called "translation operators" based on asymptotic 
expansions of the kernel function, the method we propose determines these operators com- 
putationally. In this regard, it is similar to "kernel independent FMMs" such as [1, 17, 26]. 
Since the kernel is translation invariant, the computations need be carried out only for a sin- 
gle box on each level. Thus the construction of the translation operators is very inexpensive 
(less than linear complexity). 
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8.1. Precomputing skeletons and translation operators. For each level /, we define 
a "model" box which is centered at the origin and has the same size as the boxes on level 
I. The skeleton points and the translation operators are found with respect to the model 
box. 

To illustrate the concept, suppose that we are given a source / that is non-zero set of 
points {mj}^^ in a box Q. We seek the potential at the source points. 
The pre-computation consist of the following steps: 

(1) Divide $7 into the tree structure as described in Section 6. 

(2) Construct the lists described in Section 6. 

(3) Construct the skeleton points, Tofo, and Tifi translation operators. At the lowest 
level L, we construct the skeleton points for the level L model box using the proce- 
dure described in Section 5. For each level i < L, we take four copies of the skeleton 
points for level i + 1 shifting them so that each copy makes up one quadrant of the 
model box for level i. The skeleton points and the translation operators Tofo and 
Tifi are constructed using the technique described in Section 7. 

(4) Construct the Tifo translation operators. For each level i > 1, we construct the 
Tifo translation operators for the model box. We assume that the model box is 
completely surrounded with boxes such that the interaction list has the maximum 
number of boxes possible which is 42. Let Y be the outgoing skeleton points and 
X he the incoming skeleton points for the model box on level i. For each j < 42, 
we shift X to be centered at the j^^ possible location for a box on the interaction 
list and define 

(8.1) %^ = A^'^ 

Remark 8.1. In computing the sum, described in Section 8.2, it is easy to use the pre- 
computed translation operators. For example, given a box r that has a box a on the 
interaction list, we identify which j location a is in relative to r and define T"^'"^ = Tjj-^. 

Remark 8.2. For leaf boxes of size less than 8 x 8 on level /, we utilize the fact that 
there are a finite number of points inside the box that are also in 7? and construct the 
translation operator T|jfg for the model box assuming the source points are dense. For each 
box T on level / with A^"^ sources, we construct an index vector J"^ that notes the locations 
of the sources {mJlj^Tj^ in the dense lattice. We define T'^^^ = T^^fgi'-, J'^)- The translation 
operator T^g is constructed in a similar manner. 

8.2. Application. We have now assembled the tools for computing the sum (1.7) through 
two passes through the hierarchical tree; one upwards, and one downwards. 

(1) Sweep over all leaf boxes r. For each box, construct its outgoing representation 
from the values of the sources inside it: 

(2) Sweep over all non-leaf boxes r, going from finer to coarser levels. Merge the 
outgoing expansions of the children to construct the outgoing expansion for r, 

1=2^ T^fo^ • 
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(3) Loop over all boxes r. For each box, collect the contributions to its incoming 
expansion from boxes in its interaction list: 

(4) Loop over all parent boxes r, going from coarser levels to finer. For each box r, 
loop over all children cr of r, and broadcast the the incoming expansion of r to the 
incoming expansions of a: 

(5) Sweep over all leaf nodes r. For each node, form the potential by evaluating 
the incoming representation and directly adding the contributions from the sources 
inside r and in all boxes that are not well-separated from r: 

u- = u{r) = jl^u- + A{j\r)q{r)+ A{r,r)q{r). 

8.3. Asymptotic complexity of the proposed scheme. Since the kernel (1.8) is sepa- 
rable, the cost of computing the skeleton points and the translation operators on any level 
of the quad-tree is 0{P M \F\) where P is the number of skeleton points, M is the number 
of points on the boundary the box, and \F\ is the number of well-separated proxy nodes 
used [5]. The cost of solving the least squares problem (5.1) to find the matrix Tofs for a 
leaf box is 0{P^ \F\ -\- N P \F\) where N is the number of loaded points in the box. Hence, 
the total complexity of the lattice FMM is 0(-/Vsource)- 

Also notice that the memory needed to store the precomputed information is O(A'source)- 

9. Numerical examples 

In this section, we show that the lattice FMM speed compares favorably to FFT based 
techniques except for situations where the source points populate the majority of some 
rectangular computational box. We also show that the amount of memory required scales 
linearly with the number of source terms. 

All experiments are run on a Dell desktop computer with 2GB of RAM and an Intel 
Pentium 4 3.4GHz dual processor. The method was run at a requested relative precision 
of lO"^'^. The techniques were implemented rather crudely in Matlab, which means that 
significant further gains in speed should be achievable. 

We consider the lattice Poisson problem 

(9.1) [Au]im) = f{m), 

where the points where /(m) is non-zero are confined to an n x n square subdomain Q of 
7?. The FFT produces a slightly different solution than the lattice FMM since it enforces 
periodic boundary conditions, but this is not important for our purposes. We suppose 
throughout that n is a power of two to make the comparison as favorable to the FFT as 
possible. We let Tgt denote the time required by the FFT, and Tfmm the time for the 
FMM. 

In the first experiment, we suppose that every node in the lattice is loaded, see Figure 
9.1(a), so that Asourcc = Adomain = f^^- In this case, we expect 

Tfft ~ log(n), and Tfmm ~ n^, 
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and the purpose of the numerical experiment is simply to see how the constants of propor- 
tionality compare. Figure 9.2(a) provides the answer. We see that the FMM is slower by 
roughly two order of magnitude. 



(a) Dense (b) Random (c) Embedded Circle 

Figure 9.1. Illustration of load distributions for the three experiments. 
Red dots are the source points. 




(a) (b) 

Figure 9.2. Computational profile for a dense source distribution in a 
n X n domain. Computational times using the lattice FMM and FFT are 
reported (a). The memory M (in KB) per source point (M/A'sourcc) used in 
storing the precomputed information for the lattice FMM are reported (b). 

In the next three experiments, we suppose that / is only sparsely supported in the domain 
Q, so that A^'sourcc ^ -^domain- In this case, we expect 

Tfft ~ log(n), and TpMM ~ A'source- 

In the second experiment, we suppose that n loads are distributed according to a uniform 
random distribution throughout the domain, see Figure 9.1(b). Figure 9.3(a) provides the 
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measured times. It confirms our expectation that Tfmm does not depend on A^'domaim and 
indeed, that the FMM can handle a situation with n = 10^ loaded nodes in a domain 
involving A'domain = 10^^ lattice nodes. Figure 9.3(b) illustrates the memory (in KB) 
per source point (M/A'^gource) required for storing the pre-computation information. It 
confirms our expectation that the memory (in KB) required for storing the pre-computation 
information depends linearly with respect to A'^gourcc- 

In the third experiment, we distribute the load on a circle inscribed in the square 0, see 
Figure 9.1(c), in such a way that A'^gource = a nodes are loaded, for a = 1, 1/4, 1/16, 1/64. 
Figure 9.4(a) provides the time measurements and again confirms our expectation that the 
2fmm is not dependent on A'domain- 

In the final experiment, we fix the domain to be sized 2048 x 2048 and increase the number 
of body loads distributed according to a uniform distribution. Figure 9.5(a) provides the 
time measurements in comparison with the FFT. It illustrates that for sources occupying 
less than 0.39% of the domain (corresponding to 16, 384 sources) the lattice FMM is the 
faster method. 





Figure 9.3. Computational profile for a n source points distributed via 
uniform random distribution in a n x n domain. Computational times using 
the lattice FMM are reported (a). The memory M (in KB) per source point 
(M/Nsource) uscd in storing the precomputed information for the lattice 
FMM are reported (b). 
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-B- a = 1/4 

a = 1/16 
^ a = 1/64 









lO'^ 10' 10" lo'^ 

n 

(b) 

Figure 9.4. Computational profile for an sources lie on a circle embedded 
in an nxn domain. Computational times using the lattice FMM are reported 
(a). The memory M (in KB) per source point (M/Nsource) used in storing 
the precomputed information for the lattice FMM are reported (b). 



10 



Random sources 
FFT 

10-^ Nsou rcc 



-10 




10 



10 



10 ■ 



•ce /-^domain 

(a) 
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Figure 9.5. Computational profile for a fixed 2048 x 2048 lattice domain. 
Computational times using the lattice FMM and the FFT are reported (a). 
The memory M (in KB) per source point (M/A'source) used in storing the 
precomputed information for the lattice FMM are reported (b). 
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Figure 10.1. A piece of an infinite lattice with some deviations from perfect 
periodicity. One bar has been added, three bars have been removed, and two 
bars have been strengthened. The set Qinc of effected nodes has 11 elements, 
which are marked with white circles. 



10. Extensions 

10.1. Lattices with inclusions. The lattice FMM described in this paper can be used to 
handle many lattices featuring local deviations from perfect periodicity due to, e.g., missing 
bars or lattice inclusions, see Figure 10.1. To illustrate, let us consider the equation 

(10.1) [(A + B)ii](m) = 0, meZ^ 

where A is the discrete Laplace operator (1.2), and where B is an operator corresponding to 
some local modifications to the lattice, such as those illustrated in Figure 10.1. A typical 
far-field condition of interest is to require that the potential u approaches a linear function, 
say v{m) = cirrii + C2m2 where ci and C2 are given constants, at infinity. Formally, we 
require 

(10.2) lim \u{m) - v{m)\ = 0. 

|m|— >oo 

(In what follows, any function v satisfying Av = would work.) When we have access to 
the solution operator S, defined via 



[Sn](m) = (p{m — n)u{n), 



then the equation (10.1), which is a sparse equation defined on the infinite lattice Z^, can 
be transformed to a dense equation defined only on the set Oinc of nodes which connect to 
bars whose conductivity has been changed, or to which new bars have been connected. 
To execute the reduction of (10.1) to an equation defined on Clinc, set 

U = V + w, 

and observe that since A-y = and (A + B)u = 0, the new unknown w must satisfy 

(10.3) {A + B)w = -Bv. 

Since lim|^|_^o53|u;(m)| = 0, we have S Aw = w, and so application of BS to (10.3) results 
in the equation 

(10.4) fi + BSfi = -BSBv, 
where 

IJ, = Bw. 
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The key observation is now that B is a local operator on r^ino so equation (10.4) can be 
restricted to i7inc to obtain a closed equation for /i. Once fi has been solved from this 
equation, the original potential u is recovered via 

u = V — S {Bv — fi). 

It has been demonstrated [10] that iterative solvers in many situations converge rapidly 
when used to solve a system such as (10.4). The dense coefficient matrix can rapidly be 
applied to a vector since B is a sparse operator, and since S is amenable to the lattice FMM 
described in this paper. 

10.2. Finite lattice problems involving boundary conditions. The lattice FMM de- 
scribed in this paper can also be used in a solution technique for boundary value problems. 
To illustrate, let us consider the Dirichlet problem 

( \Au](m) = 0, men, 

(10.5) r N 

I u[m) = g{m), m G F. 

When we have access to the solution operator D, defined 



(10.6) [Du\{m) = dn4>{m — n)q{n) 

where the subscript n in dn simply indicates that the difference operator is acting on the 
variable n, then the equation (10.5) can be reduced to a problem defined on F. 
An equation for q is obtained by simply restricting (10.6) to F: 

(10.7) (i(m, n) g(n) = (7(m), m G F. 

It has been shown in [10] that there exist an 0{Ny) inversion scheme for solving (10.7). 
The operator D can be applied rapidly to q using the lattice FMM described in this paper. 

Also in [10], the technique described in this section and in section 10.1 has been put 
together to create a fast solver for finite lattice problems involving boundary conditions 
and inclusions. A key tool for the fast solver is the lattice FMM described in this paper. 



11. Concluding remarks 

The paper presents a kernel independent FMM for solving Poisson problems defined on 
the integer lattice T?. For simplicity of presentation, we focused on equations involving the 
discrete Laplace operator. Techniques for evaluating the corresponding lattice fundamental 
solutions are presented. The complexity of the proposed method is O(A'sourcc) where A'source 
is the number of locations in subjected to body loads. 

Numerical experiments demonstrate that for problems where the body loads are sparsely 
distributed in a computational box the proposed method is faster and more robust than the 
FFT. For instance, it was demonstrated that using a standard desktop PC, a lattice Poisson 
equation on a lattice with A'domain = 10^^ nodes, of which A'sourcc = 10^ were loaded, was 
solved to ten digits of accuracy in three minutes. It should be noted that this problem is 
about six orders of magnitude larger than the largest Poisson problem that can be handled 
via the FFT. Also, it was demonstrated for a lattice Poisson problem in a domain with 
-^domain = 4, 194, 304 nodes, the lattice FMM is faster than the FFT when the number of 
loaded points is less than A^source = 16, 384. 
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